{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Chapter 6: Ensemble Methods\n",
    "___\n",
    "\n",
    "## Exercises"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**6.1** Why is bagging based on random sampling with replacement? Would bagging still reduce a forecast’s variance if sampling were without replacement?\n",
    "\n",
    "**6.2** Suppose that your training set is based on highly overlapping labels (i.e., with low uniqueness, as defined in Chapter 4).\n",
    "- **(a)** Does this make bagging prone to overfitting, or just ineffective? Why?\n",
    "- **(b)** Is out-of-bag accuracy generally reliable in financial applications? Why?\n",
    "\n",
    "**6.3** Build an ensemble of estimators, where the base estimator is a decision tree.\n",
    "- **(a)** How is this ensemble different from a random forest (RF)?\n",
    "- **(b)** Using sklearn, produce a bagging classifier that behaves like an RF. What parameters did you have to set up, and how?\n",
    "\n",
    "**6.4** Consider the relation between an RF, the number of trees it is composed of, and the number of features utilized:\n",
    "- **(a)** Could you envision a relation between the minimum number of trees needed in an RF and the number of features utilized?\n",
    "- **(b)** Could the number of trees be too small for the number of features used?\n",
    "- **(c)** Could the number of trees be too large for the number of observations available?\n",
    "\n",
    "**6.5** How is out-of-bag accuracy different from stratified k-fold (with shuffling) cross-validation accuracy?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Code Snippets"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "SNIPPET 6.1 ACCURACY OF THE BAGGING CLASSIFIER"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
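    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Acc classifier: 0.33, Acc bagging: 0.48\n"
     ]
    }
   ],
   "source": [
    "from scipy.special import comb\n",
    "\n",
    "# N independent classifiers, each correct with probability p, voting over k classes\n",
    "N, p, k = 100, 1./3, 3.\n",
    "# p_ accumulates P(X <= N/k), where X ~ Binomial(N, p) counts the correct votes\n",
    "p_ = 0\n",
    "for i in range(0, int(N/k) + 1):\n",
    "    p_ += comb(N, i) * p**i * (1 - p)**(N - i)\n",
    "# 1 - p_ = P(X > N/k): probability that the correct class gets more than N/k votes\n",
    "print(f\"Acc classifier: {p:.2f}, Acc bagging: {1 - p_:.2f}\")"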
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
